WebAssembly SIMD

WebAssembly SIMD - Chrome Platform Status

Origin trial in Chrome 84 -> shipped in Chrome 91

Fast, parallel applications with WebAssembly SIMD · V8

Check for support with wasm-feature-detect

emcc -msimd128 -O3 foo.c -o foo.js

-msimd128

These optimizations can automatically transform loops that perform arithmetic operations on each iteration into equivalent loops that perform the same arithmetic operations on multiple inputs at a time using SIMD instructions

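With -msimd128, LLVM's autovectorizers are enabled by default at -O2 and -O3

However, straightforwardly written code does not always vectorize optimally,

so you need to use the intrinsics from wasm_simd128.h to get it compiled the way you expect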

code:cpp
 #include <wasm_simd128.h>
 
 // Multiplies two int arrays element-wise, 4 x 32-bit lanes per iteration.
 // Assumes size is a multiple of 4.
 void multiply_arrays(int* out, int* in_a, int* in_b, int size) {
   for (int i = 0; i < size; i += 4) {
     v128_t a = wasm_v128_load(&in_a[i]);
     v128_t b = wasm_v128_load(&in_b[i]);
     v128_t prod = wasm_i32x4_mul(a, b);
     wasm_v128_store(&out[i], prod);
   }
 }

XNNPACK hand-writes intrinsics like this for its wasm target

https://github.com/google/XNNPACK/blob/33b4f751f4f69a4593de34b865f9dca45f7f4c16/src/math/roundne-wasmsimd-native.c

There is a separate implementation for each of roundne-{neon, scalar, sse, sse2, sse41, wasmsimd}
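To make clear what the multiply_arrays loop computes per iteration, here is a minimal Python model of it. This is only a sketch of the lane semantics (four 32-bit lanes multiplied at once, wrapping modulo 2**32), not the intrinsics themselves; the Python function names just mirror the C ones and are otherwise hypothetical.

```python
import struct


def i32x4_mul(a, b):
    """Model of i32x4.mul: lane-wise multiply of two 16-byte vectors, wrapping modulo 2**32."""
    xs = struct.unpack("<4I", a)
    ys = struct.unpack("<4I", b)
    return struct.pack("<4I", *((x * y) & 0xFFFFFFFF for x, y in zip(xs, ys)))


def multiply_arrays(in_a, in_b):
    """Multiply two equal-length lists of int32 values, four lanes per step."""
    if len(in_a) != len(in_b) or len(in_a) % 4 != 0:
        raise ValueError("lengths must match and be a multiple of 4")
    out = []
    for i in range(0, len(in_a), 4):
        a = struct.pack("<4i", *in_a[i:i + 4])  # stands in for wasm_v128_load
        b = struct.pack("<4i", *in_b[i:i + 4])
        out.extend(struct.unpack("<4i", i32x4_mul(a, b)))  # stands in for wasm_v128_store
    return out


products = multiply_arrays([1, 2, 3, 4, 5, 6, 7, 8], [2, 2, 2, 2, -1, -1, -1, -1])
```

Like the C version, the model only handles sizes that are a multiple of 4; a real kernel would need a scalar tail loop for the remainder.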

Rust

RUSTFLAGS="-C target-feature=+simd128" cargo build

Iterator code using iter.for_each is autovectorized well. You can also write low-level intrinsics code, as in C++

https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md

The motivation for this proposal is to introduce WebAssembly operations that map to commonly available SIMD instructions in hardware.

WebAssembly is extended with a new v128 value type and a number of new kinds of immediate operands used by the SIMD instructions.

f32x4, i16x8
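A v128 is just 16 bytes. Names such as f32x4 and i16x8 describe how an instruction interprets those bytes, not separate types. A small Python sketch of that idea, using the standard struct module with little-endian layout (variable names are illustrative only):

```python
import struct

# One 128-bit value, built here from four 32-bit floats.
raw = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)

# The same 16 bytes viewed as different lane shapes.
as_f32x4 = struct.unpack("<4f", raw)   # 4 lanes of 32-bit float
as_i16x8 = struct.unpack("<8h", raw)   # 8 lanes of signed 16-bit int
as_i8x16 = struct.unpack("<16b", raw)  # 16 lanes of signed 8-bit int
```

Each float contributes two i16 lanes: the low half of its bit pattern first, then the high half.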

https://github.com/WebAssembly/simd/blob/master/proposals/simd/W3CTAG-SIMDExplainer.md

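Some hardware supports vectors up to 512 bits wide, but 128 bits is the most common. That width is standardized for portability

Which operations should be supported?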

https://v8.dev/features/simd

Supported across multiple modern architectures

Improves performance

Performance cliffs, if any, are minimized

Map widely used SIMD ops closely onto modern hardware

A portable subset of Intel, Armv7, and Armv8. Based on SIMD.js

simd/WebAssembly-SIMD-May-2017.pdf at master · WebAssembly/simd

SIMD has driven large speed ups in certain cases such as image manipulation, video encoding/decoding, machine learning, game engines and physics engines etc - with some of these use-cases not being usable without SIMD support, making SIMD support for the web platform essential for achieving near-native speed with certain native applications.

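Things like image processing are hard to do without it

From JS, memory can be manipulated directly,

but the v128 type itself cannot be handled from JS; passing it across the boundary throws a TypeError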

opcode

https://github.com/WebAssembly/simd/blob/master/proposals/simd/BinarySIMD.md

Example

https://github.com/WebAssembly/simd/blob/master/test/core/simd/simd_i8x16_arith.wast

Constructing SIMD values

v128.const(imm: ImmByte[16]) -> v128

Builds a v128 from a sequence of 16 immediate bytes

v128.const i8x16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

i8x16.splat(x: i32) -> v128

Copies the i32 (truncated to its low 8 bits) into all 16 lanes of an i8x16 v128
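The two constructors above can be modelled in a few lines of Python. This is a sketch that represents a v128 as a 16-byte bytes object; the helper names are hypothetical and only echo the instruction names.

```python
def v128_const_i8x16(*lanes):
    """Model of `v128.const i8x16 ...`: 16 immediate lane values become 16 raw bytes."""
    if len(lanes) != 16:
        raise ValueError("i8x16 needs exactly 16 lanes")
    return bytes(x & 0xFF for x in lanes)


def i8x16_splat(x):
    """Model of i8x16.splat: keep the low 8 bits of the i32 and copy them to all 16 lanes."""
    return bytes([x & 0xFF]) * 16


zero = v128_const_i8x16(*[0] * 16)
ones = i8x16_splat(0x101)  # 0x101 truncates to 0x01 in every lane
```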

Accessing lanes

i8x16.extract_lane_s(a: v128, imm: ImmLaneIdx16) -> i32

code:py
 def S.extract_lane(a, i):
     return a[i]
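The _s suffix on i8x16.extract_lane_s means the 8-bit lane is sign-extended to i32; the _u variant zero-extends instead. A runnable sketch of the difference, again treating a v128 as 16 bytes (helper names are hypothetical):

```python
def i8x16_extract_lane_u(a, i):
    """Zero-extend lane i (0..15) of a 16-byte vector to an integer."""
    if not 0 <= i < 16:
        raise IndexError("lane index must be in 0..15")
    return a[i]


def i8x16_extract_lane_s(a, i):
    """Sign-extend lane i (0..15) of a 16-byte vector to an integer."""
    b = i8x16_extract_lane_u(a, i)
    return b - 256 if b >= 128 else b


v = bytes([0xFF, 0x7F] + [0] * 14)
signed_first = i8x16_extract_lane_s(v, 0)
unsigned_first = i8x16_extract_lane_u(v, 0)
```

In the real instruction the lane index is an immediate, so an out-of-range index is rejected at validation time rather than at run time.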

arithmetic

add, sub, dot, ...

max, avg, abs

bit

shift

and, or

popcount

comparisons

eq, ne, lt
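All of these categories work lane by lane. As an illustration, the sketch below models three i8x16 operations on 16-byte values: a wrapping add, an unsigned saturating add, and an equality comparison, which yields an all-ones lane for true and an all-zero lane for false. The helper names are hypothetical Python stand-ins, not an API.

```python
def i8x16_add(a, b):
    """Lane-wise add; each 8-bit lane wraps modulo 256."""
    return bytes((x + y) & 0xFF for x, y in zip(a, b))


def i8x16_add_sat_u(a, b):
    """Lane-wise unsigned saturating add; each lane clamps at 255."""
    return bytes(min(x + y, 255) for x, y in zip(a, b))


def i8x16_eq(a, b):
    """Lane-wise equality; true lanes become 0xFF, false lanes 0x00."""
    return bytes(0xFF if x == y else 0x00 for x, y in zip(a, b))


a = bytes([250] * 16)
b = bytes([10] * 16)
wrapped = i8x16_add(a, b)        # 260 wraps to 4 in every lane
clamped = i8x16_add_sat_u(a, b)  # 260 clamps to 255 in every lane
mask = i8x16_eq(wrapped, bytes([4] * 16))
```

The mask form of comparison results is what makes them usable with bitwise ops such as and/or to select lanes.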

load and store

v128.load(m: memarg) -> v128

https://webassembly.github.io/spec/core/bikeshed/index.html#syntax-memarg

memarg = {offset, align}

Loads 16 bytes
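A rough model of v128.load, assuming linear memory is a bytearray: the effective address is the dynamic address operand plus the memarg offset, and 16 consecutive bytes are read from there. The align field is only a hint and does not change the result, so the sketch ignores it. The function name and signature are hypothetical.

```python
def v128_load(memory, addr, offset=0):
    """Read 16 bytes starting at addr + offset from a bytearray standing in for linear memory."""
    ea = addr + offset  # effective address
    if ea < 0 or ea + 16 > len(memory):
        # A real engine traps on out-of-bounds access; an exception stands in for that here.
        raise IndexError("out of bounds memory access")
    return bytes(memory[ea:ea + 16])


memory = bytearray(range(64))
v = v128_load(memory, 8, offset=4)  # reads bytes 12..27
```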